A #hashtagtokenizer for Social Media Messages
Authors
Abstract
In social media, mainly due to length constraints, users write succinct messages and use hashtags to refer to entities, events, sentiments, or ideas. Hashtags carry a great deal of content that can help in many text-processing tasks and applications, such as sentiment analysis, named entity recognition, and information extraction. However, identifying the individual words of a hashtag is not trivial, because traditional POS taggers typically treat a hashtag as a single token even though it may contain multiple words, e.g. #fergusondecision or #imcharliehebdo. In this work, we propose a generic model for hashtag tokenization that splits a hashtag into several tokens corresponding to the individual words it contains (e.g. “#imcharliehebdo” would become four tokens: “#”, “I”, “am”, and “Charlie Hebdo”). Our hashtagtokenizer is based on a machine learning segmentation method for the Chinese language and also uses Wikipedia as an encyclopedic knowledge base. We evaluated the inference power of our approach by comparing the tokens it produces with those produced by human taggers. The results demonstrate the good accuracy and applicability of the proposed model for general-purpose applications.
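The abstract does not spell out the segmentation model, so the following is only a minimal sketch under stated assumptions, not the authors' method (which adapts a machine learning segmenter for Chinese and consults Wikipedia). It swaps in a simpler, classic technique, maximum-probability word segmentation by dynamic programming over a unigram lexicon, to illustrate how a hashtag body can be split and how an entity list can keep a multiword name such as "Charlie Hebdo" together. The lexicon and its frequencies, the `ENTITIES` table (a hypothetical stand-in for Wikipedia titles), the contraction table, and all function names are hypothetical.

```python
import math

# Hypothetical toy unigram lexicon (lowercase word -> relative frequency).
# A real system would estimate these from a large corpus; these numbers
# are invented purely for illustration.
LEXICON = {
    "i": 1e-2, "a": 2e-2, "he": 5e-3, "on": 1e-2, "son": 1e-4,
    "lie": 1e-5, "char": 1e-6, "im": 1e-4,
    "charlie": 3e-5, "hebdo": 1e-6, "ferguson": 2e-5, "decision": 5e-5,
}

# Hypothetical stand-in for Wikipedia article titles: a concatenated,
# lowercased title maps to its surface form and is kept as one token.
ENTITIES = {"charliehebdo": "Charlie Hebdo"}
ENTITY_PROB = 1e-4

# Hypothetical contraction table, applied only when rendering the output.
CONTRACTIONS = {"im": ["I", "am"]}

# Any single character may be emitted as an "unknown" piece at a steep cost,
# which guarantees that every string has at least one segmentation.
UNKNOWN_CHAR_PROB = 1e-12

MAX_LEN = max(len(k) for k in [*LEXICON, *ENTITIES])


def piece_log_prob(piece):
    """Log-probability of a candidate piece, or None if it is not allowed."""
    if piece in ENTITIES:
        return math.log(ENTITY_PROB)
    if piece in LEXICON:
        return math.log(LEXICON[piece])
    if len(piece) == 1:
        return math.log(UNKNOWN_CHAR_PROB)
    return None


def segment(body):
    """Viterbi-style DP: most probable split of `body` into pieces."""
    n = len(body)
    # best[end] = (score of best split of body[:end], start of its last piece)
    best = [(-math.inf, -1)] * (n + 1)
    best[0] = (0.0, -1)
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_LEN), end):
            lp = piece_log_prob(body[start:end])
            if lp is None:
                continue
            score = best[start][0] + lp
            if score > best[end][0]:
                best[end] = (score, start)
    # Walk the back-pointers from the end of the string.
    pieces, end = [], n
    while end > 0:
        start = best[end][1]
        pieces.append(body[start:end])
        end = start
    return pieces[::-1]


def tokenize_hashtag(tag):
    """Split a hashtag such as '#imcharliehebdo' into '#' plus word tokens."""
    if not tag.startswith("#"):
        raise ValueError("expected a hashtag starting with '#'")
    tokens = ["#"]
    for piece in segment(tag[1:].lower()):
        if piece in ENTITIES:
            tokens.append(ENTITIES[piece])
        elif piece in CONTRACTIONS:
            tokens.extend(CONTRACTIONS[piece])
        else:
            tokens.append(piece)
    return tokens


print(tokenize_hashtag("#imcharliehebdo"))    # ['#', 'I', 'am', 'Charlie Hebdo']
print(tokenize_hashtag("#fergusondecision"))  # ['#', 'ferguson', 'decision']
```

Because a split is scored as the sum of its pieces' log-probabilities, the single entity match "charliehebdo" outscores the two-word split "charlie" + "hebdo" under these toy numbers. In a realistic setting the probabilities would come from corpus counts and the entity list from Wikipedia titles, neither of which is reproduced here.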
Similar resources
Mass Media vs. the Mass of Media: A Study on the Human Nodes in a Social Network and their Chosen Messages
In Internet-based social networks, the nodes have the most pivotal role in the processes and outcomes of the networks. Whether they pay attention to a message in the network or ignore it defines the fate of the message. One message is shared and re-shared by millions of users and another is left forgotten. The current study tries to shed light on one aspect of the role of the users in a social ...
Effectiveness of Media Intervention on Students' Attitudes toward Drug and Tobacco: Based on Health Education and Legal Consequences
Background and purpose: Drug and tobacco addiction is one of the major threats to adolescents and educating this group could be of great benefit in preventing the problem. The purpose of this study was to investigate the effectiveness of media intervention on students' attitude towards drug and tobacco use. Materials and methods: In this quasi-experimental research a male high school was ra...
Using Machine Learning Algorithms for Automatic Cyber Bullying Detection in Arabic Social Media
Social media allows people to interact and express their thoughts or feelings about different subjects. However, some users may write offensive tweets to others via social media, which is known as cyberbullying. Successful prevention depends on automatically detecting malicious messages. Automatic detection of bullying in the text of social media by analyzing the text "tweets" via one of the machine l...
New Media and Peacemaking Diplomacy
A life full of peace has always been a dream for human societies. Peace is still considered one of the main concerns of the modern world, because each nation deals with its own kind of violence and war. Therefore, peacemaking has become one of the objectives of governments. Peace has been affected by various factors throughout history. Media and communication technologies are two of the factors...
The Necessity of Media Knowledge Education to Students to Promote their Media Literacy Competency
The paper designs an indigenous module to upgrade media literacy among high school students, utilizing “multimedia education”, “the cultural studies”, the theory of the “New London Group”, and “the political media literacy” by Ferguson. The method used falls into two categories: documentary and survey. The population comprises the tenth and eleventh graders in the 2nd region of the ministry of ...
Journal title:
- Int. J. Comput. Linguistics Appl.
Volume 6, Issue
Pages -
Publication date: 2015